Welcome to the Tuesday session.
Yesterday we began by considering a topic that most of you are already familiar with.
We talked about the Bayesian classifier, the Bayesian decision rule.
By maximizing the a posteriori probability we can build a classifier that is
optimal with respect to the average loss under a 0-1 loss function.
We also showed a small and intuitive proof of this fact.
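To spell out the rule being referred to (this is the standard formulation; the notation for the posterior is chosen for this write-up, not taken from the board):

```latex
% Bayes decision rule: choose the class with the largest posterior probability.
\hat{y}(x) = \arg\max_{y}\; p(y \mid x)

% Under the 0-1 loss, this rule minimizes the average loss, i.e. the error probability:
\mathbb{E}[L] = \int \bigl( 1 - \max_{y} p(y \mid x) \bigr)\, p(x)\, \mathrm{d}x .
```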
And then in the second chapter we started to look at logistic regression.
What we did is we applied a very simple trick.
We just rewrote the posterior using the Bayesian decomposition of the a posteriori probability,
which gives a known ratio.
And then we did a little trick.
We divided by the denominator, which gave us this expression, and then we
applied the exponential function and the logarithm.
So we mapped things back and forth and we ended up finally with a representation that
is the sigmoid function.
So the a posteriori probability can be rewritten by a very basic arithmetic operation as a
function that looks like that.
That's called the sigmoid function or the logistic function.
That's 1 over 1 plus e to the power of minus f of x.
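Spelled out for two classes y in {0, 1} (one way to carry out the rewriting just described: divide numerator and denominator by the first term, then rewrite the remaining ratio with exp and log; the symbols are chosen for this write-up):

```latex
p(y{=}1 \mid x)
  = \frac{p(x \mid y{=}1)\, p(y{=}1)}
         {p(x \mid y{=}1)\, p(y{=}1) + p(x \mid y{=}0)\, p(y{=}0)}
  = \frac{1}{1 + e^{-f(x)}},
\qquad
f(x) = \log \frac{p(x \mid y{=}1)\, p(y{=}1)}{p(x \mid y{=}0)\, p(y{=}0)} .
```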
And then we showed yesterday that f of x equal to 0 defines our
decision boundary.
So if you have a classification problem where you have two classes
and you have a decision boundary like that, it's exactly f of x is equal to 0.
That's the implicit representation of this.
And if you have to write down the a posteriori probability of the two classes, you know the
posterior is 1 over 1 plus e to the power of plus or minus f of x, depending on which class
you are considering.
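In the same notation as above, the two posteriors and the boundary condition read:

```latex
p(y{=}1 \mid x) = \frac{1}{1 + e^{-f(x)}},
\qquad
p(y{=}0 \mid x) = 1 - p(y{=}1 \mid x) = \frac{1}{1 + e^{+f(x)}} .
% On the decision boundary f(x) = 0, both posteriors are equal to 1/2.
```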
And the sigmoid function, for those of you who attended Pattern Recognition in the winter term,
is something you know from neural networks.
It is heavily used in neural networks, and we will see later that logistic regression
and the standard perceptron do pretty much the same thing.
So then we looked at the derivative of the sigmoid function, and it has the nice property
that the derivative is just the function times 1 minus the function.
That is also a very useful property, and we will reuse it later when we compute derivatives
to estimate the parameters of the function f of x.
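The computation behind this property is only one line; writing g(a) = 1 / (1 + e^{-a}):

```latex
g'(a) = \frac{e^{-a}}{\bigl(1 + e^{-a}\bigr)^{2}}
      = \frac{1}{1 + e^{-a}} \cdot \frac{e^{-a}}{1 + e^{-a}}
      = g(a)\,\bigl(1 - g(a)\bigr) .
```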
This is what these sigmoid functions look like.
They more or less approximate a step function.
Depending on the choice of the prefactor, you can get closer and closer to a sharp
step function.
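As a quick illustration (not part of the lecture material), a few lines of Python show how a prefactor a in g(a x) sharpens the sigmoid toward a step function; the name a and the chosen values are just for this sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(t):
    """Logistic function 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

x = np.linspace(-6.0, 6.0, 400)
for a in (0.5, 1.0, 5.0, 20.0):
    # A larger prefactor a makes the transition around x = 0 steeper,
    # so the curve approaches a sharp step function.
    plt.plot(x, sigmoid(a * x), label=f"a = {a}")

plt.axhline(0.5, linestyle=":", linewidth=0.8)
plt.xlabel("x")
plt.ylabel("sigmoid(a * x)")
plt.legend()
plt.show()
```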
Then we looked at the decision boundary, and then we started to look at an example.
And this is a very, very important example that we have considered yesterday.
Let's assume the class conditional probability, so p of x given the class, is represented
by a Gaussian.
The formula for the Gaussian is something you should be able to write down by heart.
It's not a good strategy to show up in the oral exam without being able to write down
the Gaussian.
So be aware of that.
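For reference, the multivariate Gaussian density in d dimensions, with mean vector \mu and covariance matrix \Sigma, is:

```latex
\mathcal{N}(x \mid \mu, \Sigma)
  = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}}
    \exp\!\Bigl( -\tfrac{1}{2}\, (x - \mu)^{\top} \Sigma^{-1} (x - \mu) \Bigr) .
```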
And we have seen yesterday that once we assume that the class conditional probability is